Abstract

Dynamic changes in complex, real-time environments, such as modern video games, can violate an agent’s expectations. We describe a system that responds competently to such violations by changing its own goals, using an algorithm based on a conceptual model for goal driven autonomy. We describe this model, clarify when such behavior is beneficial, and describe our system (which employs an HTN planner) in terms of how it partially instantiates and diverges from this model. Finally, we describe a pilot evaluation of its performance for controlling agent behavior in a team shooter game. We claim that the ability to self-select goals can, under some conditions, improve plan execution performance in a dynamic environment.

Introduction

AI researchers have repeatedly acknowledged the limiting assumptions of classical algorithms for automated planning (Ghallab et al., 2004; Nau, 2007). For example, they assume static environments (where changes in the environment are due only to the planner’s actions), off-line planning (where the planner does not monitor execution), and that all goals are fixed and unchanging. Naturally, these assumptions do not always apply. For example, team shooter games are dynamic environments populated by multiple agents whose actions cause exogenous events. Also, the agents must perform online planning by executing their plans during the game. Finally, the goals of the game change as the game state changes (e.g., if a win is infeasible, then the agent should attempt to gain a draw).

We present a system, GDA-HTNbots, which reasons about the events occurring in its environment, changes its own goals in response to them, and replans to satisfy these changed goals. To do this, GDA-HTNbots constantly monitors its environment for unexpected changes and dynamically formulates a new goal when appropriate.

In this paper, we briefly summarize related research, review a conceptual model for goal driven autonomy (GDA) (Klenk et al., 2010), describe GDA-HTNbots as a partial instantiation of this model, and present a limited empirical study of its performance. The results support our primary claim: agent performance in a team shooter domain with exogenous events can, for some conditions, be improved through appropriate self-selection of goals.

Related Work

Plan generation is the problem of generating a sequence of actions that transform an initial state into some desired state (Ghallab et al., 2004). GDA-HTNbots controls plan generation in two ways: first, it determines when the planner must start working on a new goal; second, it determines what goal the planner should attempt to satisfy.
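
As a concrete (and deliberately simple) illustration of plan generation, the sketch below performs breadth-first forward search over states represented as sets of ground facts. It is not the planner used by GDA-HTNbots, which relies on SHOP; the STRIPS-style action encoding, the `bfs_plan` function, and the toy `move` actions are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
# Hypothetical STRIPS-style action: (preconditions, add list, delete list).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def bfs_plan(initial: State, goal: FrozenSet[str],
             actions: Dict[str, Action]) -> Optional[List[str]]:
    """Return a shortest action sequence that transforms `initial` into a
    state containing every fact in `goal`, or None if no such plan exists."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name in sorted(actions):
            pre, add, delete = actions[name]
            if pre <= state:
                successor = (state - delete) | add
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [name]))
    return None


# Toy example: one agent walking from loc0 to loc2 through loc1.
MOVES = {
    "move(loc0,loc1)": (frozenset({"at(loc0)"}), frozenset({"at(loc1)"}), frozenset({"at(loc0)"})),
    "move(loc1,loc2)": (frozenset({"at(loc1)"}), frozenset({"at(loc2)"}), frozenset({"at(loc1)"})),
}
print(bfs_plan(frozenset({"at(loc0)"}), frozenset({"at(loc2)"}), MOVES))
# ['move(loc0,loc1)', 'move(loc1,loc2)']
```

Breadth-first search is chosen only because it is short and returns a shortest plan; it does not scale to game-sized state spaces.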

A considerable amount of research exists on relaxing the assumptions of classical planning. For example, contingency planning permits dynamic environments (Dearden et al., 2003). Agents that use this approach create a plan that assumes the most likely result of each action, and generate contingency plans that, with the help of monitoring, are executed only if a plan execution failure occurs at some anticipated point.

Another assumption of classical planning concerns the set of goals that the agent is trying to achieve. If no plan exists from the initial state that satisfies the given goals, then classical planning fails. Partial satisfaction planning relaxes this all-or-nothing constraint, and instead focuses on generating plans that achieve some “best” subset of goals (i.e., the plan that gives the maximum trade-off between total achieved goal utilities and total incurred action cost) (van den Briel et al., 2004).
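
The objective of partial satisfaction planning can be illustrated by brute force on a toy problem. The sketch below assumes, unlike a real partial satisfaction planner, that the set of actions each goal requires is known in advance; it scores every goal subset as total utility minus the cost of the union of required actions (shared actions are paid for once) and keeps the best. The goal names, utilities, and costs are invented for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def best_goal_subset(utility: Dict[str, float],
                     needed: Dict[str, Set[str]],
                     action_cost: Dict[str, float]) -> Tuple[FrozenSet[str], float]:
    """Exhaustively find the goal subset with the highest net benefit.

    Net benefit = sum of the chosen goals' utilities minus the cost of the
    union of the actions they need. The empty subset (benefit 0) is the
    fallback, so goals that cost more than they are worth are dropped.
    """
    goals = sorted(utility)
    best: FrozenSet[str] = frozenset()
    best_value = 0.0
    for k in range(1, len(goals) + 1):
        for subset in combinations(goals, k):
            actions = set().union(*(needed[g] for g in subset))
            value = (sum(utility[g] for g in subset)
                     - sum(action_cost[a] for a in actions))
            if value > best_value:
                best, best_value = frozenset(subset), value
    return best, best_value


# Hypothetical numbers: loc0 and loc1 share the costly "go_north" action.
UTILITY = {"ctrl_loc0": 5, "ctrl_loc1": 5, "ctrl_loc2": 3}
NEEDED = {"ctrl_loc0": {"go_north", "hold_loc0"},
          "ctrl_loc1": {"go_north", "hold_loc1"},
          "ctrl_loc2": {"go_south", "hold_loc2"}}
COST = {"go_north": 4, "go_south": 4, "hold_loc0": 1, "hold_loc1": 1, "hold_loc2": 1}
print(best_goal_subset(UTILITY, NEEDED, COST))  # picks ctrl_loc0 and ctrl_loc1, net benefit 4
```

Pursuing either northern goal alone only breaks even, and adding ctrl_loc2 costs more than it earns, so the best subset is the pair that shares the expensive action.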

While these approaches each relax an important assumption of classical planning, neither addresses how to respond to unexpected events that occur during execution. One straightforward solution is incremental planning, which plans for a fixed time horizon; after executing that plan, the planner generates a plan for the next horizon, and this process iterates until the goal state is reached. Another approach is dynamic replanning, which monitors the plan’s execution: if it becomes apparent that the plan will fail, the planner replans from the current state. For example, HOTRiDE (Ayan et al., 2007) employs this strategy for non-combatant evacuation planning. These approaches can also be combined. For example, CPEF (Myers, 1999) incrementally generates plans to achieve air superiority in military combat and replans when unexpected events occur during execution (e.g., a plane is shot down).
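
The monitoring-and-replanning loop that dynamic replanners share can be sketched in a few lines. The example below is a toy, not HOTRiDE or CPEF: an agent walks along a one-dimensional corridor, each action carries the position it is expected to produce, and an exogenous event (a hypothetical knock-back at tick 2) causes a mismatch that triggers replanning from the observed state.

```python
from typing import Callable, List, Tuple


def plan_moves(pos: int, goal: int) -> List[int]:
    """Trivial planner for a 1-D corridor: unit steps toward the goal."""
    step = 1 if goal >= pos else -1
    return [step] * abs(goal - pos)


def run_with_replanning(start: int, goal: int,
                        world: Callable[[int, int], int],
                        max_ticks: int = 100) -> Tuple[int, int]:
    """Execute a plan, replanning from the observed position whenever it
    differs from the expected one. Returns (final position, replan count).

    `world(tick, expected_pos)` returns the position actually observed,
    which lets the caller inject exogenous events.
    """
    pos, tick, replans = start, 0, 0
    plan = plan_moves(pos, goal)
    while pos != goal and tick < max_ticks:
        step = plan.pop(0)
        expected = pos + step
        pos = world(tick, expected)       # the environment may interfere
        tick += 1
        if pos != expected:               # monitoring detects the failure
            plan = plan_moves(pos, goal)  # replan from the current state
            replans += 1
    return pos, replans


def knock_back(tick: int, expected: int) -> int:
    """Hypothetical exogenous event: pushed back two cells at tick 2."""
    return expected - 2 if tick == 2 else expected


print(run_with_replanning(0, 5, knock_back))  # (5, 1)
```

Note that the replanner still pursues the same goal (reaching position 5); it never asks whether a different goal would now be more appropriate.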

However, these approaches do not perform goal formulation; they continue trying to satisfy the current goal, regardless of whether their focus should dynamically shift towards another goal (due to unexpected events).

Fortunately, some other recent research has addressed this topic. For example, Coddington and Luck (2003) endowed agents with motivations, which formulate goals in response to thresholds on specific state variables (e.g., if a rover’s battery charge falls below 50%, then a goal of full battery charge will be formulated (Meneguzzi & Luck, 2007)). Here we adopt an alternative rule-based approach whose antecedents can match complex game states.
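
The threshold-based motivations described above can be sketched as follows, using the battery example. The `Motivation` class and its encoding are illustrative assumptions, not Coddington and Luck’s implementation; note that each antecedent tests a single state variable, whereas the rules used in this paper can match more complex game states.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Motivation:
    """Formulates `goal` when `variable` drops below `threshold`."""
    variable: str
    threshold: float
    goal: str

    def triggered_goal(self, state: Dict[str, float]) -> Optional[str]:
        value = state.get(self.variable)
        if value is not None and value < self.threshold:
            return self.goal
        return None


def formulate_goals(motivations: List[Motivation],
                    state: Dict[str, float]) -> List[str]:
    """Collect the goals of every motivation whose threshold is crossed."""
    goals = []
    for m in motivations:
        goal = m.triggered_goal(state)
        if goal is not None:
            goals.append(goal)
    return goals


MOTIVATIONS = [Motivation("battery_charge", 0.5, "full_battery_charge")]
print(formulate_goals(MOTIVATIONS, {"battery_charge": 0.42}))  # ['full_battery_charge']
print(formulate_goals(MOTIVATIONS, {"battery_charge": 0.80}))  # []
```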

Research on game AI takes a different approach to goal formulation, in which specific states lead directly to behaviors (i.e., sequences of actions). This approach is implemented using behavior trees, which are prioritized topological goal structures that have been used in HALO 2 and other high-profile games (Champandard, 2007). Behavior trees, which are restricted to fully observable environments, require substantial domain engineering to anticipate all events. GDA can be applied to partially observable environments by using explanations that provide additional context for goal formulation.
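
The core of a behavior tree, a selector that tries its children in priority order and runs the first whose condition holds, can be sketched as below. Real behavior trees also have sequence nodes, decorators, and running/success/failure status propagation, all omitted here; the conditions and behaviors are hypothetical. Each condition reads the state directly, which illustrates why the approach presumes the relevant state is observable.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

GameState = Dict[str, Any]
Condition = Callable[[GameState], bool]


def select_behavior(children: List[Tuple[Condition, List[str]]],
                    state: GameState) -> Optional[List[str]]:
    """Priority selector: return the action sequence of the first child
    whose condition holds in `state`; children are listed by priority."""
    for condition, behavior in children:
        if condition(state):
            return behavior
    return None


# Hypothetical shooter-bot tree, highest priority first.
BOT_TREE: List[Tuple[Condition, List[str]]] = [
    (lambda s: s["health"] < 30, ["retreat", "find_health_pack"]),
    (lambda s: s["enemy_visible"], ["aim", "shoot"]),
    (lambda s: True, ["patrol"]),
]
print(select_behavior(BOT_TREE, {"health": 20, "enemy_visible": True}))   # ['retreat', 'find_health_pack']
print(select_behavior(BOT_TREE, {"health": 80, "enemy_visible": False}))  # ['patrol']
```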

Cox’s (2007) work inspired the conception of GDA, with its focus on integrated planning, execution, and goal reasoning. We extend this concept here in multiple ways, including by embedding it in the context of adversarial gaming environments.

Goal Driven Autonomy

Goal-driven autonomy is a process for online planning in autonomous agents (Klenk et al., 2010). Figure 1 illustrates how GDA extends Nau’s (2007) model of online planning. The GDA model primarily expands and details the scope of the Controller, which interacts with a Planner and a State Transition System $\Sigma$ (an execution environment). We present only a simplified version of this model, and our system is only a partial implementation of it.

System $\Sigma$ is a tuple $\Sigma = (S, A, V, \gamma)$ with states $S$, actions $A$, exogenous events $V$, and state transition function $\gamma: S \times (A \cup V) \rightarrow 2^S$, which describes how the execution of an action or the occurrence of an event transforms the environment from one state to another. For example, given an action $a$ in state $s_i$, $\gamma(s_i, a)$ returns the set of possible updated states, one of which is $s_{i+1}$.
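
To make the signature $\gamma: S \times (A \cup V) \rightarrow 2^S$ concrete, the sketch below encodes a toy $\Sigma$ for the ownership of a single domination location. The state, action, and event names are hypothetical; the point is that $\gamma$ maps a state and an action or exogenous event to a set of possible successor states, so, for instance, a contested capture may or may not succeed.

```python
from typing import Dict, FrozenSet, Tuple

# Toy Sigma = (S, A, V, gamma) for the owner of one domination location.
S = frozenset({"neutral", "ours", "theirs"})   # states
A = frozenset({"capture", "wait"})             # the agent's actions
V = frozenset({"enemy_assault"})               # exogenous events

# gamma: S x (A | V) -> 2^S, stored as a table of successor sets.
# Pairs missing from the table are treated as inapplicable (empty set).
GAMMA: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("neutral", "capture"): frozenset({"ours"}),
    ("theirs", "capture"): frozenset({"ours", "theirs"}),  # firefight may fail
    ("neutral", "wait"): frozenset({"neutral"}),
    ("ours", "wait"): frozenset({"ours"}),
    ("theirs", "wait"): frozenset({"theirs"}),
    ("neutral", "enemy_assault"): frozenset({"theirs"}),
    ("ours", "enemy_assault"): frozenset({"ours", "theirs"}),
}


def gamma(state: str, action_or_event: str) -> FrozenSet[str]:
    """Return the set of states that may follow `state`."""
    if state not in S or action_or_event not in A | V:
        raise ValueError(f"unknown state or action/event: {state!r}, {action_or_event!r}")
    return GAMMA.get((state, action_or_event), frozenset())


print(sorted(gamma("ours", "enemy_assault")))  # ['ours', 'theirs']
```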

Figure 1: A Conceptual Model for Goal Driven Autonomy

The Planner receives as input a planning problem $(M_\Sigma, s_c, g_c)$, where $M_\Sigma$ is a model of $\Sigma$ (the environment), $s_c$ is the current state, and $g_c \in G$ is a goal that can be satisfied by some set of states $S_{g_c} \subseteq S$. The Planner outputs a plan $p_c = (A_c, X_c)$, which is a sequence of actions $A_c = [a_c, \ldots, a_{c+n}]$ paired with a sequence of expectations $X_c = [x_c, \ldots, x_{c+n}]$. Each $x_i \in X_c$ is a set of state constraints corresponding to the sequence of states $[s_c, \ldots, s_{c+n}]$ expected to occur when executing $A_c$ in $s_c$ using $M_\Sigma$.
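
One way to represent a plan $p_c = (A_c, X_c)$ in code is sketched below. Modeling each expectation $x_i$ as a mapping from state variables to required values is an assumption made for illustration (the model only says each $x_i$ is a set of state constraints); variables an expectation does not mention are left unconstrained. The `hold` action and the variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Assumed encoding of one expectation x_i: required values of some variables.
Expectation = Dict[str, object]


@dataclass
class Plan:
    """p_c = (A_c, X_c): actions a_c..a_{c+n} paired with expectations x_c..x_{c+n}."""
    actions: List[str]
    expectations: List[Expectation]

    def __post_init__(self) -> None:
        if len(self.actions) != len(self.expectations):
            raise ValueError("each action must be paired with one expectation")

    def steps(self) -> Iterator[Tuple[str, Expectation]]:
        """Yield (action, expectation) pairs in execution order."""
        return iter(zip(self.actions, self.expectations))


p0 = Plan(actions=["move(bot3,loc2)", "hold(bot3,loc2)"],
          expectations=[{"Loc(bot3)": "loc2"},
                        {"Loc(bot3)": "loc2", "Owner(loc2)": "friendly"}])
for action, expected in p0.steps():
    print(action, expected)
```

Keeping each expectation next to the action that should produce it lets a controller check the observed state immediately after every action.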

The Controller sends the plan’s actions to $\Sigma$ and processes the resulting observations. The GDA model takes as input the initial state $s_0$, initial goal $g_0$, and $M_\Sigma$, and sends them to the Planner to generate a plan $p_0$ consisting of action sequence $A_0$ and expectations $X_0$. When executing $A_0$, the Controller performs the following four knowledge-intensive tasks, which distinguish the GDA model:

1. Discrepancy detection: This compares the observations $s_c$ obtained from executing action $a_{c-1}$ in state $s_{c-1}$ with the expectation $x_c \in X$ (i.e., it tests whether any constraints are violated, corresponding to unexpected observations). If one or more discrepancies $D_c \subseteq D$ are found, then they are given to the following function.

2. Explanation generation: Given a state $s_c$ and a set of discrepancies $D_c \subseteq D$, this hypothesizes one or more explanations $e_c \in E$ of $D_c$’s cause(s), where $e_c$ is a belief about (possibly unknown) aspects of $s_c$ or $M_\Sigma$.

3. Goal formulation: This creates a goal $g_c \in G$ in response to a set of discrepancies $D_c$, given their explanation $e_c \in E$ and the current state $s_c \in S$.

4. Goal management: Given a set of existing/pending goals $G' \subseteq G$ (one of which may be the focus of the current plan execution) and a new goal $g_c \in G$, this may update $G'$ to create $G''$ (e.g., by adding $g_c$ and/or deleting or modifying other pending goals) and will select the next goal $g' \in G''$ to be given to the Planner. (It is possible that $g_c = g'$.)
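
The interplay of the four tasks can be summarized as a control loop. The sketch below is a minimal reading of the conceptual model, not GDA-HTNbots itself: the planner, the environment, and the four task functions are passed in as plain callables (all hypothetical), plans are lists of (action, expectation) pairs, and the Controller replans whenever goal management returns a goal. The toy world at the end mirrors the first row of Table 1 in miniature: bot3 is sent to a defended loc2, fails to arrive, and the goal is switched to loc1.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
Step = Tuple[str, State]                # (action, expectation)


def gda_controller(s0: State, g0: str,
                   plan: Callable[[State, str], List[Step]],
                   execute: Callable[[State, str], State],
                   detect: Callable[[State, State], List[str]],
                   explain: Callable[[State, List[str]], str],
                   formulate: Callable[[str, State], Optional[str]],
                   manage: Callable[[List[str], str], Tuple[List[str], str]],
                   max_steps: int = 50) -> Tuple[List[str], State]:
    """Execute plans, applying Tasks 1-4 after each action.
    Returns the sequence of goals pursued and the final observed state."""
    state, goal, pending = s0, g0, [g0]
    goals_pursued = [g0]
    steps = plan(state, goal)
    for _ in range(max_steps):
        if not steps:
            break
        action, expected = steps.pop(0)
        state = execute(state, action)
        discrepancies = detect(expected, state)           # Task 1
        if not discrepancies:
            continue
        explanation = explain(state, discrepancies)       # Task 2
        new_goal = formulate(explanation, state)          # Task 3
        if new_goal is None:
            continue
        pending, next_goal = manage(pending, new_goal)    # Task 4
        if next_goal != goal:
            goal = next_goal
            goals_pursued.append(goal)
        steps = plan(state, goal)                         # replan for the selected goal
    return goals_pursued, state


# --- Hypothetical toy instantiation (loc2 is heavily defended) ---
def toy_plan(state: State, goal: str) -> List[Step]:
    loc = goal.split(":")[1]                              # goals look like "at:loc1"
    return [(f"move(bot3,{loc})", {"bot3": loc})]


def toy_execute(state: State, action: str) -> State:
    target = action[len("move(bot3,"):-1]
    return {"bot3": "respawn" if target == "loc2" else target}


def toy_detect(expected: State, observed: State) -> List[str]:
    return [var for var, val in expected.items() if observed.get(var) != val]


goals, final = gda_controller(
    {"bot3": "respawn"}, "at:loc2", toy_plan, toy_execute, toy_detect,
    explain=lambda s, d: "Defended(loc2)",
    formulate=lambda e, s: "at:loc1" if e == "Defended(loc2)" else None,
    manage=lambda pending, g: ([g], g),                  # goal replacement
)
print(goals, final)  # ['at:loc2', 'at:loc1'] {'bot3': 'loc1'}
```

Passing each task in as a function reflects the model’s lack of commitment to particular algorithms: any of the four can be swapped out without touching the loop.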

GDA makes no commitments to specific types of algorithms for the highlighted tasks, and treats the Planner as a black box. This description of GDA’s conceptual model is necessarily incomplete due to space constraints. For example, it does not describe the reasoning models used by Tasks 1-4 (each of which may perform substantial inferencing) nor how they are obtained, it assumes that multiple plans are not executed simultaneously, and it does not address goal management issues such as goal prioritization or goal transformation (Cox & Veloso, 1998). However, this description should suffice to frame the general model, which we use to implement our system, GDA-HTNbots.

The Domination Game (DOM)

In this paper, we describe a simple GDA system and its application to controlling a set of agents’ actions in a team shooter game called DOM. In DOM games, two teams (of multiple agents) compete over specific locations in the game world called domination locations. A team receives a point for every five seconds that each domination location remains under its control (i.e., while the only agents in that location are members of that team). The game is won by the first team to score a pre-specified number of points.
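
The scoring rule can be sketched as a function over periodic snapshots of which agents occupy each domination location, one snapshot per five-second interval. Following the control condition stated above, a location scores for a team only when that team’s agents are the only ones present. The snapshot format, team names, and the small winning threshold in the example are assumptions (the games reported below are played to 20,000 points).

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Snapshot = Dict[str, List[str]]   # location -> teams of the agents standing there


def controlling_team(occupants: Iterable[str]) -> Optional[str]:
    """A location is controlled by a team iff only that team's agents are present."""
    teams = set(occupants)
    return next(iter(teams)) if len(teams) == 1 else None


def play(snapshots: List[Snapshot],
         points_to_win: int) -> Tuple[Dict[str, int], Optional[int], List[str]]:
    """Award one point per controlled location per 5-second snapshot.
    Returns (scores, 1-based index of the deciding snapshot, winners)."""
    score: Counter = Counter()
    for tick, snapshot in enumerate(snapshots, start=1):
        for occupants in snapshot.values():
            team = controlling_team(occupants)
            if team is not None:
                score[team] += 1
        winners = sorted(t for t, pts in score.items() if pts >= points_to_win)
        if winners:
            return dict(score), tick, winners
    return dict(score), None, []


SNAPSHOTS = [
    {"loc0": ["red"], "loc1": ["red", "blue"], "loc2": ["blue"]},   # loc1 contested
    {"loc0": ["red"], "loc1": ["red"], "loc2": ["blue"]},
    {"loc0": ["red"], "loc1": ["red"], "loc2": ["blue"]},
]
print(play(SNAPSHOTS, points_to_win=5))  # ({'red': 5, 'blue': 3}, 3, ['red'])
```

Because points accrue per location rather than per kill, holding two of three locations for a few intervals decides the game in this toy run regardless of how many agents die along the way.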

Domination games are interesting because individual deaths have no direct impact on the final score; any agent that is killed continues playing after a short pause, starting in a new location (this is called respawning). This allows overall strategy and team organization to have a large impact on game play.

DOM is a good test for GDA systems because its environment is dynamic: the world changes due to the opponent’s actions, which our system cannot predict. Also, some actions in the game are non-deterministic (e.g., adjudicating when members of two teams shoot each other). Hence, our system must react to unexpected events and dynamically generate new plans that satisfy different goals. Manually engineering these plans a priori is difficult, and infeasible for suitably complex task environments. GDA removes the need to create these plans beforehand by providing an agent with the ability to reason about its goals and dynamically determine which to pursue.

A Goal Driven Autonomy System

GDA-HTNbots is an extension of HTNbots (Hoang et al., 2005) in which the Controller performs the four tasks of the GDA model. HTNbots uses SHOP (Nau et al., 1999) to generate game-playing strategies for DOM based on an external hierarchical task network (HTN). These strategies are designed to control a majority of the domination locations in the game world. Whenever the situation changes (i.e., when the owner of a domination location changes), HTNbots generates a new plan; HTNbots is therefore a dynamic replanning system. It calls SHOP to find the first method that is applicable to a given task, and uses it to generate subtasks that are recursively decomposed by other methods into a sequence of actions to be executed in the environment.
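
The decomposition strategy described above (take the first applicable method for a task, then recursively decompose its subtasks into primitive actions) can be sketched as follows. This is a simplification, not SHOP: it does not backtrack over alternative methods when a decomposition fails, and it does not update the state as operators are applied. The task names, method conditions, and bot assignments are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]
# A method = (applicability test, function producing an ordered list of subtasks).
Method = Tuple[Callable[[State], bool], Callable[[State], List[str]]]


def decompose(task: str, state: State,
              methods: Dict[str, List[Method]], primitive: Set[str]) -> List[str]:
    """Decompose `task` into primitive actions using, for each compound
    task, the FIRST method whose applicability test holds in `state`."""
    if task in primitive:
        return [task]
    for applicable, subtasks in methods.get(task, []):
        if applicable(state):
            actions: List[str] = []
            for sub in subtasks(state):
                actions.extend(decompose(sub, state, methods, primitive))
            return actions
    raise ValueError(f"no applicable method for {task!r}")


def secure(bot: str, loc: str) -> List[Method]:
    """Hypothetical method: send `bot` to `loc` and guard it."""
    return [(lambda s: True, lambda s: [f"move({bot},{loc})", f"guard({bot},{loc})"])]


ASSIGNMENT = [("bot1", "loc0"), ("bot2", "loc1"), ("bot3", "loc2")]
METHODS: Dict[str, List[Method]] = {
    # Tried in order: secure all three locations only with 3+ agents.
    "control_majority": [
        (lambda s: s["agents"] >= 3, lambda s: ["secure(loc0)", "secure(loc1)", "secure(loc2)"]),
        (lambda s: True, lambda s: ["secure(loc0)", "secure(loc1)"]),
    ],
    **{f"secure({loc})": secure(bot, loc) for bot, loc in ASSIGNMENT},
}
PRIMITIVE = {f"{op}({bot},{loc})" for op in ("move", "guard") for bot, loc in ASSIGNMENT}

print(decompose("control_majority", {"agents": 2}, METHODS, PRIMITIVE))
# ['move(bot1,loc0)', 'guard(bot1,loc0)', 'move(bot2,loc1)', 'guard(bot2,loc1)']
```

Method order encodes preference: the first, more ambitious method is used only when its condition holds, and the second acts as a fallback.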

Unlike HTNbots, GDA-HTNbots reasons about its goals, and can dynamically formulate which goal it should plan to satisfy. GDA-HTNbots extends HTNbots to instantiate the GDA conceptual model as follows:

State Transition System ($\Sigma$) (task environment): We apply GDA-HTNbots to the task of controlling an agent playing DOM. We described this task and game environment in the preceding section, and describe an example of this application in the next section.

Model of the State Transition System ($M_\Sigma$): We describe the state transition function for DOM using SHOP axioms and operators. Exogenous events are not directly modeled in SHOP; instead, HTNbots plays DOM by monitoring the game state and replanning as needed.

Planner: GDA-HTNbots uses SHOP, although other planners can be used. Given the current state $s_c$ (initially $s_0$), current goal $g_c$ (initially $g_0$), and $M_\Sigma$, it generates an HTN plan $p_c$ designed to achieve $g_c$ when executed in $\Sigma$ starting in $s_c$. This plan includes the sequence of expectations $X_c$ anticipated from its execution, which are determined by the HTN’s methods.

Discrepancy Detector: This continuously monitors $p_c$’s execution in $\Sigma$ such that, at any time $t$, it compares the observations of state $s_t$ provided by $\Sigma$ with the expected state $x_t$. If it detects any discrepancy $d_t$ (i.e., a mismatch) between them, then it outputs $d_t$ to the Explanation Generator.
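
A minimal discrepancy detector over discrete state is sketched below, using the first row of Table 1 as the example. Encoding an expectation as a set of (ground atom, expected truth value) literals and reading the observation under a closed-world assumption (atoms not observed are false) are assumptions for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

Literal = Tuple[str, bool]   # (ground atom, expected truth value)


def detect_discrepancies(expected: Set[Literal],
                         observed_true: FrozenSet[str]) -> List[Literal]:
    """Return the expected literals contradicted by the observation, sorted.

    Closed-world assumption: atoms absent from `observed_true` are false.
    """
    return sorted((atom, value) for atom, value in expected
                  if (atom in observed_true) != value)


x_t = {("Loc(bot3,loc2)", True), ("OwnCtrl(loc0)", True)}
s_t = frozenset({"OwnCtrl(loc0)"})          # bot3 never reached loc2
print(detect_discrepancies(x_t, s_t))  # [('Loc(bot3,loc2)', True)]
```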

Explanation Generator: Given a discrepancy $d_t$ for state $s_t$, this generates an explanation $e_t$ of $d_t$. GDA-HTNbots tracks the history of the game by counting the number of times agents from the opposing team have visited each location. Using this information and the discrepancy $d_t$, GDA-HTNbots identifies an explanation $e_t$, which is the strategy that the opponent is pursuing.
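
The visit-counting idea can be sketched as a small opponent model. How GDA-HTNbots turns the counts into a strategy label is not specified here, so the rule below (declare a location defended when it accounts for at least half of all observed enemy visits) and the `Unexplained` fallback are hypothetical.

```python
from collections import Counter
from typing import Iterable


class OpponentModel:
    """Counts enemy visits per location and explains failed arrivals."""

    def __init__(self, defended_share: float = 0.5) -> None:
        self.visits: Counter = Counter()
        self.defended_share = defended_share   # hypothetical threshold

    def observe(self, enemy_locations: Iterable[str]) -> None:
        """Record the locations of enemy agents seen in one game update."""
        self.visits.update(enemy_locations)

    def explain(self, failed_location: str) -> str:
        """Explain why one of our agents failed to reach `failed_location`."""
        total = sum(self.visits.values())
        if total and self.visits[failed_location] / total >= self.defended_share:
            return f"Defended({failed_location})"
        return "Unexplained"


model = OpponentModel()
model.observe(["loc2", "loc2", "loc0"])
model.observe(["loc2", "loc1"])
print(model.explain("loc2"))  # Defended(loc2)
print(model.explain("loc1"))  # Unexplained
```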

Goal Formulator: Given an explanation $e_t$ representing the opponent’s current strategy, GDA-HTNbots formulates a goal $g_t$ using a set of rules of the form:

if e then g

The new goal $g_t$ directs GDA-HTNbots to counter the opponent’s strategy.
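
A rule base of this form can be sketched as an ordered list of (antecedent, goal) pairs, where a rule fires when all of its antecedent atoms appear in the explanation and the first matching rule wins. The two rules below transcribe the rows of Table 1; ground (variable-free) rules and first-match conflict resolution are simplifying assumptions.

```python
from typing import FrozenSet, List, Optional, Tuple

Rule = Tuple[FrozenSet[str], str]   # (antecedent atoms, goal to formulate)

RULES: List[Rule] = [
    (frozenset({"Defended(loc2)"}), "Loc(bot3,loc1)"),
    (frozenset({"EnemyCtrl(loc1)", "EnemyCtrl(loc2)"}), "OwnCtrl(loc2)"),
]


def formulate_goal(explanation: FrozenSet[str], rules: List[Rule]) -> Optional[str]:
    """Return the goal of the first rule whose antecedent the explanation satisfies."""
    for antecedent, goal in rules:
        if antecedent <= explanation:
            return goal
    return None


print(formulate_goal(frozenset({"EnemyCtrl(loc1)", "EnemyCtrl(loc2)"}), RULES))  # OwnCtrl(loc2)
```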

Goal Manager: GDA-HTNbots employs a trivial goal management strategy. Given a new goal $g_t$, it immediately selects this as the current goal, which the Controller submits to the Planner for plan generation.

Example in the DOM Game

In this paper, we report on a case study in which the system’s task is to control a team of agents in DOM. Figure 2 shows an example of a map in a domination game with five locations. Our scenario began with the following initial state and goal:

Initial State ($s_0$): This includes the locations of all the agents in the game and which team (if any) controls each domination location.

Initial Goal ($g_0$): The initial goal is to win the game (i.e., be the first to accumulate 20,000 points). GDA-HTNbots sends $g_0$ to SHOP, which generates a plan to dispatch GDA-HTNbots’ agents to each domination location and control them. Given the uncertainties about the opponent’s actions and the stochastic outcome of engagements, this plan may not yield the expected results. For example, Table 1 presents some sample explanations for the DOM game (we do not display the full state due to space constraints). The first row highlights a situation where the bot3 agent was expected to be at location 2, but this did not happen. By examining the history of enemy agents at that location, GDA-HTNbots assumes the opponent is executing a strategy to heavily defend location 2. Using the explanation-goal rule set, GDA-HTNbots counters this strategy by setting a goal to have bot3 at an alternative location, namely location 1.

Figure 2: An example DOM game map with five domination locations (yellow flags), where small rectangles identify the respawning locations for the agents and the remaining two types of icons denote each player’s agents.

Table 1: Example explanations of discrepancies (with some expectations and observations shown), and the corresponding recommended goals.

| Discrepancy | Explanation | Next Goal |
|---|---|---|
| $x_t$: Loc(bot3,loc2); $s_t$: ¬Loc(bot3,loc2) | Defended(loc2) | Loc(bot3,loc1) |
| $x_t$: OwnPts(t) > EnemyPts(t); $s_t$: OwnPts(t) < EnemyPts(t) | EnemyCtrl(loc1), EnemyCtrl(loc2) | OwnCtrl(loc2) |

The second row shows a discrepancy where GDA-HTNbots expected to earn more points than the enemy over the last time period $t$. However, this did not happen because the enemy controlled two of the three (total) locations. The rule set determines that the next goal should be to control one of the locations controlled by the opponent (e.g., loc2). Given this, our system generates a plan to send two agents to location 2.

This example illustrates how GDA-HTNbots explains discrepancies by reasoning about the opponent’s strategies. This enables GDA-HTNbots to formulate goals that counter the opponent’s actions.


Table 2: Domination Teams and Descriptions

| Opponent Team | Description | Difficulty |
|---|---|---|
| Dom1 Hugger | Sends all agents to domination location 0 | trivial |
| First Half Of Dom Locations | Sends an agent to the first half + 1 domination locations; extra agents patrol between the two locations | easy |
| 2nd Half Of Dom Locations | Sends an agent to the second half + 1 domination locations; extra agents patrol between the two locations | easy |
| Each Agent to One Dom | Each agent is assigned to a different domination location and remains there for the entire game | medium-easy |
| Smart Opportunistic | Sends agents to each domination location the team doesn’t own; if possible, it will send multiple agents to each unowned location | hard |
| Greedy Distance | Each turn, the agents are assigned to the closest domination location they do not own | hard |
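
To make the Greedy Distance opponent’s rule concrete, the sketch below reassigns every agent to the nearest domination location its team does not own. Straight-line (Euclidean) distance, the coordinates, and issuing no orders when every location is already owned are assumptions; path distance on an actual map may differ.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def greedy_distance_orders(agents: Dict[str, Point],
                           dom_locations: Dict[str, Point],
                           owned: Set[str]) -> Dict[str, str]:
    """Send each agent to the closest domination location not yet owned
    (ties broken by location name). Returns agent -> target location."""
    targets = {name: pos for name, pos in dom_locations.items() if name not in owned}
    if not targets:
        return {}   # assumption: no new orders when everything is owned
    return {agent: min(targets, key=lambda loc: (math.dist(pos, targets[loc]), loc))
            for agent, pos in agents.items()}


DOMS = {"loc0": (1.0, 0.0), "loc1": (9.0, 0.0), "loc2": (5.0, 5.0)}
print(greedy_distance_orders({"e1": (0.0, 0.0), "e2": (10.0, 0.0)}, DOMS, owned={"loc0"}))
# {'e1': 'loc2', 'e2': 'loc1'}
```

Even this simple metric rule reacts to every change in ownership, which is the kind of distance-based reasoning that purely symbolic strategies do not capture.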

Pilot Evaluation

We conducted a pilot study to assess the performance utility of HTNbots’ GDA enhancements. We claim that GDA increases our system’s ability to win DOM games versus a set of opponents. To test this claim, we performed an ablation study that isolates the GDA functionality.

In particular, we compared the performance of GDA-HTNbots and HTNbots. Hoang et al. (2005) and Munoz-Avila and Hoang (2006) reported that HTNbots performs well versus several hard-coded opponents, so HTNbots should provide a good baseline for our evaluation. However, we expected GDA-HTNbots to outperform HTNbots against opponents whose behaviors motivate the dynamic formulation of new goals.

We recorded and compared the performance of these systems versus the same set of hard-coded opponents. Our performance metric is the difference in the score between the system and opponent while playing DOM, divided by the system’s score.
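
The metric can be computed as sketched below. One caveat, based only on the averages reported in Table 3: the published percentages are reproduced when the score difference is divided by the higher of the two scores, which equals the system’s score whenever the system wins. For rows where the system loses, dividing by the system’s own score would instead give about -24.1% rather than the reported -19.4% for HTNbots against Smart Opportunistic. The function below therefore normalizes by the larger score; treat that as an assumption about how the table was computed.

```python
def normalized_difference(system_score: float, opponent_score: float) -> float:
    """Percent score difference, normalized by the higher of the two scores
    (an assumption inferred from the averages reported in Table 3)."""
    return 100.0 * (system_score - opponent_score) / max(system_score, opponent_score)


# Average scores taken from Table 3 (system score first).
print(round(normalized_difference(20_002, 3_759), 1))   # 81.2  (HTNbots vs. Dom1 Hugger)
print(round(normalized_difference(16_113, 20_001), 1))  # -19.4 (HTNbots vs. Smart Opportunistic)
print(round(normalized_difference(19_614, 19_534), 1))  # 0.4   (GDA-HTNbots vs. Greedy Distance)
```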

We ran both systems against each of the six opponents summarized in Table 2. The first three were the same ones used by Hoang et al. (2005) to test HTNbots, which was found to perform well against them. Hence, these are challenging DOM opponents for testing whether GDA enhancements can improve HTNbots’ performance. The final three opponents were created in subsequent studies of HTNbots to test reinforcement learning (Smith et al., 2007) and case-based reasoning (Auslander et al., 2008) algorithms. Among these, the final two opponents were found to be particularly difficult to beat. In summary, these opponents form a challenging and varied testbed to measure the utility of GDA-HTNbots.

The experimental setup was as follows: both systems were tested versus each of these opponents on the map shown in Figure 2, which is the same map that was used in the previously mentioned experiments. Each game was run three times to account for the randomness introduced by non-deterministic game behaviors.


Table 3: Avg. Percent Normalized Difference in Game AI System vs. Opponent Scores (with avg. scores in parentheses). The opponent team controls the enemies; the game AI system (HTNbots or GDA-HTNbots) controls the friendly forces.

| Opponent Team | HTNbots | GDA-HTNbots |
|---|---|---|
| Dom1 Hugger | *81.2% (20,002 vs. 3,759) | 80.9% (20,001 vs. 3,822) |
| First Half Of Dom Locations | *47.6% (20,001 vs. 10,485) | 42.0% (20,001 vs. 11,605) |
| 2nd Half Of Dom Locations | *58.4% (20,003 vs. 8,318) | 12.5% (20,001 vs. 17,503) |
| Each Agent to One Dom | *49.0% (20,001 vs. 10,206) | 40.6% (20,002 vs. 11,882) |
| Smart Opportunistic | -19.4% (16,113 vs. 20,001) | *-4.8% (19,048 vs. 20,001) |
| Greedy Distance | -17.0% (16,605 vs. 20,001) | *0.4% (19,614 vs. 19,534) |

(* denotes the better average measure in each row.)


The results are shown in Table 3, where each row displays the normalized average difference in scores (computed over three games) versus each opponent, along with the average scores for each player. We repeated the same experiment with a second map and obtained results consistent with the ones discussed here.² The limited number of trials in this pilot study prevents us from computing statistical significance. Therefore, we focus our discussion on general trends and game analysis.

Discussion

The results can be summarized as follows: against difficult opponents (the final two opponents in Table 2), GDA-HTNbots outperforms HTNbots. Against easy opponents (the first four listed in Table 2), HTNbots outperforms GDA-HTNbots. We examined game-play records to investigate why this occurred, and concluded that the initial strategy chosen by HTNbots is frequently sufficient to win the game. For example, the Dom1 Hugger (opponent) team sends all agents to one location. It is easy for HTNbots to immediately generate a winning plan against this strategy and start winning from the outset. Indeed, in situations where the goals should not be changed, this implementation of GDA should not be used.

The more difficult opponents reason about the distance between the agent locations and the domination locations as part of their strategy. These strategies are particularly effective versus HTNbots and GDA-HTNbots, which encode their knowledge symbolically without metric information. Indeed, the two hard opponents soundly defeat HTNbots. The advantage of using a specialized component to reason about goals becomes apparent in this study. By tracking which domination locations the opponent is trying to control and which goal was used to generate the current plan, GDA-HTNbots can react quickly to the opponent’s strategy. This allowed GDA-HTNbots to outperform the Greedy Distance opponent (which outperformed HTNbots) and to perform almost as well as the Smart Opportunistic opponent.

² For the second map, we did not obtain results for Greedy Distance because of path-finding issues.

GDA-HTNbots is a simple implementation of the four GDA tasks. In future research, we plan to study other methods for each of these tasks. First, during discrepancy detection, GDA-HTNbots observes discrepancies only between discrete variables. A more complete system would consider the continuous attributes of the environment (e.g., precise agent locations, health, and score), and would also represent and reason about relations among these attributes. Second, the explanation generation process allows the system to consider reasons for a given discrepancy. In this paper, our system does not dynamically derive the explanation using a comprehensive reasoning mechanism, and it only considers the opponent’s strategy. In open domains, it is important to hypothesize new entities and events that were not represented in $M_\Sigma$; this capability would substantially diverge from current approaches to online planning. Third, during goal formulation, GDA-HTNbots uses simple rules to map explained discrepancies to specific goals. However, not all discrepancies require goal formulation (e.g., some should be ignored). A more complete system should reason about alternative responses to detected discrepancies, such as by reasoning about game state information. This explains why GDA-HTNbots underperformed HTNbots versus the easier opponents: if GDA-HTNbots had a more sophisticated Discrepancy Resolver, it could reason that a given discrepancy does not warrant goal formulation, in which case it would continue pursuing the same goal as selected by HTNbots and, hence, would achieve the same performance as HTNbots. Finally, GDA-HTNbots performs only goal replacement during goal management. This was effective for DOM because it does not require balancing various goals and focuses solely on defeating an individual opponent. In more complex environments, a list of pending goals must be maintained, and the system will need to consider which goal(s) to pursue at any given time by reasoning about resources, tradeoffs, and priorities.
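
As a contrast to the goal replacement used by GDA-HTNbots, the sketch below keeps a queue of pending goals and always hands the Planner the highest-priority one. Collapsing resources, trade-offs, and priorities into a single number per goal is a strong simplification, and the `PendingGoals` class and the priorities in the example are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class PendingGoals:
    """Hypothetical goal manager: a priority queue of pending goals."""

    def __init__(self) -> None:
        # Entries are (-priority, insertion index, goal): heapq is a min-heap,
        # so negating the priority puts the most important goal first, and
        # the insertion index breaks ties in first-come order.
        self._heap: List[Tuple[float, int, str]] = []
        self._next_index = 0

    def add(self, goal: str, priority: float) -> None:
        heapq.heappush(self._heap, (-priority, self._next_index, goal))
        self._next_index += 1

    def select(self) -> Optional[str]:
        """Goal to hand to the Planner next (it stays pending until achieved)."""
        return self._heap[0][2] if self._heap else None

    def achieved(self, goal: str) -> None:
        """Drop a satisfied goal and restore the heap invariant."""
        self._heap = [entry for entry in self._heap if entry[2] != goal]
        heapq.heapify(self._heap)


manager = PendingGoals()
manager.add("OwnCtrl(loc2)", priority=2)
manager.add("Loc(bot3,loc1)", priority=5)
print(manager.select())  # Loc(bot3,loc1)
manager.achieved("Loc(bot3,loc1)")
print(manager.select())  # OwnCtrl(loc2)
```

Unlike replacement, the lower-priority goal is not discarded when a more urgent one arrives; it resurfaces once the urgent goal is satisfied.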

In future work, we plan to run more trials, which will allow us to statistically analyze our claims. In addition, we will investigate other methods for the four GDA tasks. We also envision ways in which the background knowledge used in the GDA process can be learned automatically. Finally, we will examine the behavior of GDA systems for tasks in other complex domains that have a greater need to reason dynamically about which goals to satisfy (e.g., the TAO Sandbox (Auslander et al., 2009), which is a simulator used to train Naval officers for decision making in anti-submarine warfare missions).

Final Remarks

Plan generation is the problem of generating a sequence of actions that transform the initial state into some goal state (Ghallab et al., 2004). Goal driven autonomy (GDA) enhances plan generation by addressing two important questions. First, it answers when: it determines when the planner should start working on a new goal. Second, it answers what: it determines which goal should be satisfied in response to a detected discrepancy. Our initial implementation of GDA illustrates this concept. GDA-HTNbots can interrupt the execution of a strategy based on an earlier goal, and instead generate and execute a plan that achieves a different goal that now has higher priority.

Our implementation barely scratches the surface, as it is a simple instantiation of the GDA conceptual model. However, our investigation with GDA-HTNbots illustrates its potential. The DOM game, when played against challenging opponents, has all of the elements targeted by GDA: the environment is dynamic, it requires game AI agents to conduct online planning, and agents must constantly change their goals to perform well (versus the more challenging opponents). Furthermore, the baseline system, HTNbots, performed well on this DOM task. Hence, GDA-HTNbots’ performance improvement versus the more challenging opponents is encouraging, and we look forward to assessing a more complete GDA model’s performance in future work.